sentiment classification reveal line attractor
Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics
Recurrent neural networks (RNNs) are a widely used tool for modeling sequential data, yet they are often treated as inscrutable black boxes. Given a trained recurrent network, we would like to reverse engineer it--to obtain a quantitative, interpretable description of how it solves a particular task. Even for simple tasks, a detailed understanding of how recurrent networks work, or a prescription for how to develop such an understanding, remains elusive. In this work, we use tools from dynamical systems analysis to reverse engineer recurrent networks trained to perform sentiment classification, a foundational natural language processing task. Given a trained network, we find fixed points of the recurrent dynamics and linearize the nonlinear system around these fixed points. Despite their theoretical capacity to implement complex, high-dimensional computations, we find that trained networks converge to highly interpretable, low-dimensional representations. In particular, the topological structure of the fixed points and corresponding linearized dynamics reveal an approximate line attractor within the RNN, which we can use to quantitatively understand how the RNN solves the sentiment analysis task. Finally, we find this mechanism present across RNN architectures (including LSTMs, GRUs, and vanilla RNNs) trained on multiple datasets, suggesting that our findings are not unique to a particular architecture or dataset. Overall, these results demonstrate that surprisingly universal and human interpretable computations can arise across a range of recurrent networks.
Reviews: Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics
UPDATE after reading author rebuttal: Look forward to the changes in the final version of the paper. Detailed comments: 1. Understanding of RNNs for sentiment classification task - theoretical analysis backed by empirical observations: This work takes up the sentiment classification task. This work figured out some fixed points and centered their analysis of RNNs around them. The RNN states can be cast into a 1-dimensional manifold of these fixed points. The PCA of RNN states across examples reveal that training helps RNNs figure out a lower-dimensional representation. Interestingly the movement along this low dimensional manifold is minimal in absence of inputs or presence of neutral/un-informative words, whereas they show more movements if polarity bearing words are present, thus, showing linear separability effects along this 1-D manifold.
Reviews: Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics
This paper provides insightful analysis into what decision processes are actually implemented by a trained recurrent network for sentiment classification, and uncover simple line attractor dynamics. All reviewers agree that this is interesting and illuminating, and that this work shows a good example of what can be done to open the black box of deep systems.
Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics
Recurrent neural networks (RNNs) are a widely used tool for modeling sequential data, yet they are often treated as inscrutable black boxes. Given a trained recurrent network, we would like to reverse engineer it--to obtain a quantitative, interpretable description of how it solves a particular task. Even for simple tasks, a detailed understanding of how recurrent networks work, or a prescription for how to develop such an understanding, remains elusive. In this work, we use tools from dynamical systems analysis to reverse engineer recurrent networks trained to perform sentiment classification, a foundational natural language processing task. Given a trained network, we find fixed points of the recurrent dynamics and linearize the nonlinear system around these fixed points.
Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics
Maheswaranathan, Niru, Williams, Alex, Golub, Matthew, Ganguli, Surya, Sussillo, David
Recurrent neural networks (RNNs) are a widely used tool for modeling sequential data, yet they are often treated as inscrutable black boxes. Given a trained recurrent network, we would like to reverse engineer it--to obtain a quantitative, interpretable description of how it solves a particular task. Even for simple tasks, a detailed understanding of how recurrent networks work, or a prescription for how to develop such an understanding, remains elusive. In this work, we use tools from dynamical systems analysis to reverse engineer recurrent networks trained to perform sentiment classification, a foundational natural language processing task. Given a trained network, we find fixed points of the recurrent dynamics and linearize the nonlinear system around these fixed points.